Clustering short push-to-talk segments
Abstract
We present a method for clustering short push-to-talk speech segments for different numbers of speakers. An iterative Mean Shift algorithm based on the cosine distance is used to perform speaker clustering on i-vectors generated from many short speech segments. We report results in terms of accuracy, the average number of detected speakers (ANDS), the average cluster purity (ACP), the average speaker purity (ASP), and K. We achieve clustering accuracies of 90.0%, 86.9%, and 72.1% for 3, 15, and 60 speakers, respectively.
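The abstract names the technique and the metrics but gives no details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows a plain flat-kernel mean shift on length-normalised vectors with cosine distance (not necessarily the iterative variant used in the paper), followed by ACP, ASP, and K computed with the definitions commonly used in the speaker clustering literature, where K is the geometric mean of ACP and ASP. The function names, the `bandwidth` value, and the use of random vectors as stand-ins for i-vectors are all assumptions made for illustration.

```python
import numpy as np


def cosine_mean_shift(x, bandwidth=0.3, n_iter=50, tol=1e-6):
    """Flat-kernel mean shift on the unit sphere using cosine distance.

    x: (n, d) array of vectors (hypothetical stand-ins for i-vectors).
    bandwidth: cosine-distance radius of the flat kernel (assumed value).
    Returns one integer cluster label per row of x.
    """
    # Length-normalise so that cosine distance equals 1 - dot product.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    modes = x.copy()
    for _ in range(n_iter):
        # Cosine distance from every current mode to every data point.
        dist = 1.0 - modes @ x.T
        inside = (dist <= bandwidth).astype(float)
        # Shift each mode to the mean of its neighbours, then renormalise.
        shifted = inside @ x
        norms = np.linalg.norm(shifted, axis=1, keepdims=True)
        shifted = shifted / np.maximum(norms, 1e-12)
        # Largest cosine distance moved by any mode in this iteration.
        moved = np.max(1.0 - np.sum(shifted * modes, axis=1))
        modes = shifted
        if moved < tol:
            break
    # Merge modes that converged to within one bandwidth of each other.
    labels = -np.ones(len(x), dtype=int)
    centres = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centres):
            if 1.0 - m @ c <= bandwidth:
                labels[i] = k
                break
        else:
            centres.append(m)
            labels[i] = len(centres) - 1
    return labels


def purity_scores(labels, speakers):
    """Return (ACP, ASP, K) from a cluster-by-speaker count matrix.

    Assumed definitions, with n_ij = segments of speaker j in cluster i:
      ACP = (1/N) * sum_i sum_j n_ij^2 / n_i.
      ASP = (1/N) * sum_j sum_i n_ij^2 / n_.j
      K   = sqrt(ACP * ASP)
    """
    _, ci = np.unique(np.asarray(labels), return_inverse=True)
    _, sj = np.unique(np.asarray(speakers), return_inverse=True)
    counts = np.zeros((ci.max() + 1, sj.max() + 1))
    np.add.at(counts, (ci, sj), 1)  # accumulate n_ij
    total = counts.sum()
    acp = np.sum(counts**2 / counts.sum(axis=1, keepdims=True)) / total
    asp = np.sum(counts**2 / counts.sum(axis=0, keepdims=True)) / total
    return acp, asp, np.sqrt(acp * asp)
```

Calling `cosine_mean_shift(x)` on an array of segment vectors and passing the resulting labels, together with the true speaker identities, to `purity_scores` gives the three scores. The number of distinct labels plays the role of the number of detected speakers. The bandwidth controls how many clusters emerge and would need tuning on real data.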
Similar resources
JANUS-II: Towards Spontaneous Spanish Speech Recognition (Puming Zhan; in Proceedings of ICSLP-96)
JANUS-II is a research system for investigating various issues in speech-to-speech translation and has been implemented for speech-to-speech translation in many languages [1]. In this paper, we address the Spanish speech recognition part of JANUS-II. First, we report the bootstrapping and optimization of the recognition system. Then we investigate the difference between push-to-talk and cross-talk ...
Multimodal Speaker Diarization Utilizing Face Clustering Information
Multimodal clustering/diarization tries to answer the question "who spoke when" by using audio and visual information. Diarization consists of two steps: first, segmenting the audio and detecting the speech segments; then, clustering the speech segments to group them by speaker. This task has been mainly studied on audiovisual data from meetings, news broadcasts or talk ...
NTP-PoCT: a conformance test tool for push-to-talk over cellular network
Push-to-talk over Cellular (PoC) provides a walkie-talkie-like service in the cellular telecommunications network [9]. In this service, several predefined PoC group members participate in one PoC session. Since the PoC session is half-duplex, only one group member speaks at a time, and the others listen. Therefore, a user must ask for permission to speak by pressing the push-to-talk button. ...
Estimating Speaker Clustering Quality Using Logistic Regression
This paper focuses on estimating clustering validity by using logistic regression. For many applications it might be important to estimate the quality of the clustering, e.g., when clustering speech segments, to decide whether to use the clustered data for speaker verification. For speaker clustering of short segments, the common criteria for cluster validity are average cl...
Publication date: 2015